Conclusion: plastic consumption is best predicted by population and urban population after a log transformation followed by scaling (R² ≈ 0.78 for population, ≈ 0.87 for urban population).

1. Relative changes over time

1.1 Consumption over time

1.2 Waste over time

1.3 Stock over time

2. Consumption vs. population

2.1 Total consumption vs. population 1950-2017

2.2 Plastic consumption in every country appears to grow with population as a power law (a straight line on a log-log scale), although each country started from a different level in 1950. Here is a plot of plastic consumption vs. population for each country in 1950:

log-log regression of total plastic consumption vs. population in 1950:

## 
## Call:
## lm(formula = log1p(consumption[1, 7, ]) ~ log1p(socioeconomic[1, 
##     1, ]))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.6024 -0.5795 -0.0054  0.6270  1.8318 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  -4.29966    0.75872  -5.667 1.39e-06 ***
## log1p(socioeconomic[1, 1, ])  0.70699    0.08213   8.608 1.21e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9621 on 40 degrees of freedom
## Multiple R-squared:  0.6494, Adjusted R-squared:  0.6407 
## F-statistic:  74.1 on 1 and 40 DF,  p-value: 1.209e-10
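The slope of a log-log fit can be read as an elasticity: a power law $Y = kP^a$ becomes $\log Y = \log k + a\log P$, so the 1950 estimate $a \approx 0.707$ suggests that a country with 10% more people consumed roughly $1.1^{0.707} - 1 \approx 7\%$ more plastic. (The fit above uses log1p(x) = log(1 + x), which is close to log(x) only when x is large.) The sketch below is a minimal illustration on simulated data, not the real dataset; the population range, the noise level and the true slope of 0.7 are assumptions.

```r
# Minimal sketch on simulated data (hypothetical values, not the real dataset):
# a power law Y = k * P^a is a straight line on a log-log scale.
set.seed(1)
n_countries <- 42
log_pop  <- runif(n_countries, 6, 14)           # hypothetical log populations
log_cons <- -4.3 + 0.7 * log_pop +              # assumed true slope (elasticity) 0.7
            rnorm(n_countries, sd = 0.5)        # country-level noise

fit <- lm(log_cons ~ log_pop)
elasticity <- unname(coef(fit)["log_pop"])

# Percent change in consumption implied by a 10% larger population
pct_change <- 100 * (1.1^elasticity - 1)
cat(sprintf("slope = %.3f, +10%% population -> %+.1f%% consumption\n",
            elasticity, pct_change))

# log1p(x) = log(1 + x) tolerates zeros and approaches log(x) for large x
cat(sprintf("log1p(1000) - log(1000) = %.5f\n", log1p(1000) - log(1000)))
```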

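Let $Y_n = Y/Y_0$ and $P_n = P/P_0$ be consumption and population normalized by their 1950 values $Y_0$ and $P_0$. If $\log Y = a\log P + b$ and $\log Y_0 = m\log P_0 + n$, then subtracting the second equation from the first and substituting $\log P = \log P_n + \log P_0$ gives

$$\log Y_n = a\log P_n + (a-m)\log P_0 + (b-n).$$

The results are not good (R² = 0.23), probably because the countries fall into two clusters (see the figure above).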

## 
## Call:
## lm(formula = logYn ~ logPn + logP0, na.action = na.exclude)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.7448 -1.0831  0.0005  0.8385  4.4581 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.76958    0.20193   8.763   <2e-16 ***
## logPn        3.99850    0.14479  27.616   <2e-16 ***
## logP0       -0.17666    0.01981  -8.916   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.447 on 2513 degrees of freedom
## Multiple R-squared:  0.2341, Adjusted R-squared:  0.2335 
## F-statistic:   384 on 2 and 2513 DF,  p-value: < 2.2e-16
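The identity $\log Y_n = a\log P_n + (a-m)\log P_0 + (b-n)$ behind this model can be checked numerically. The sketch below builds a hypothetical years × countries panel in which every year after 1950 follows $\log Y = a\log P + b$ exactly and the 1950 row follows $\log Y_0 = m\log P_0 + n$; regressing logYn on logPn and logP0 should then recover $b - n$, $a$ and $a - m$ as the intercept and slopes. The parameter values and growth rates are made up for illustration. The 1950 row is dropped because logYn and logPn are 0 there by construction.

```r
# Minimal sketch, hypothetical parameters: check that
# logYn = a*logPn + (a - m)*logP0 + (b - n) when logY = a*logP + b (after 1950)
# and logY0 = m*logP0 + n (in 1950).
set.seed(42)
n_years <- 68                      # 1950..2017
n_ctry  <- 30
a <- 0.9; b <- -3                  # assumed fit for later years
m <- 0.7; n <- -4                  # assumed 1950 cross-sectional fit

logP0  <- runif(n_ctry, 6, 14)                   # log population in 1950
growth <- runif(n_ctry, 0.005, 0.03)             # annual log growth per country
yr     <- 0:(n_years - 1)
logP   <- outer(yr, growth) +                    # years x countries
          matrix(logP0, n_years, n_ctry, byrow = TRUE)

logY <- a * logP + b
logY[1, ] <- m * logP0 + n                       # 1950 row follows the 1950 fit

# Normalize by 1950: logPn = log(P / P0), logYn = log(Y / Y0), column by column
logPn <- sweep(logP, 2, logP[1, ])
logYn <- sweep(logY, 2, logY[1, ])

# Drop 1950; as.vector() stacks columns, so logP0 is repeated per country
d <- data.frame(logYn = as.vector(logYn[-1, ]),
                logPn = as.vector(logPn[-1, ]),
                logP0 = rep(logP0, each = n_years - 1))
fit <- lm(logYn ~ logPn + logP0, data = d)
print(coef(fit))   # approximately (b - n, a, a - m) = (1, 0.9, 0.2)
```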

Consumption vs. population, both normalized by their 1950 values:

## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8605 -0.3927 -0.0772  0.3256  5.1536 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.659e-16  1.233e-02    0.00        1    
## x           7.480e-01  1.242e-02   60.21   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.659 on 2854 degrees of freedom
## Multiple R-squared:  0.5595, Adjusted R-squared:  0.5594 
## F-statistic:  3626 on 1 and 2854 DF,  p-value: < 2.2e-16

K-means clustering with two clusters:

## [1] "2 clusters with size of:"
## [1] 38  4
##        Republic of Korea                    Japan                    India 
##                        1                        1                        1 
## United States of America                   Brazil                    China 
##                        1                        1                        1 
##                   Turkey                   Canada                  Belgium 
##                        1                        1                        1 
##                 Bulgaria                  Czechia                  Denmark 
##                        1                        1                        2 
##                  Germany                  Estonia                  Ireland 
##                        1                        2                        2 
##                   Greece                    Spain                   France 
##                        1                        1                        1 
##                    Italy                   Cyprus                   Latvia 
##                        1                        1                        1 
##                Lithuania               Luxembourg                  Hungary 
##                        1                        1                        1 
##                    Malta              Netherlands                  Austria 
##                        1                        1                        1 
##                   Poland                 Portugal                  Romania 
##                        1                        1                        1 
##                 Slovenia                 Slovakia                  Finland 
##                        2                        1                        1 
##                   Sweden           United Kingdom                  Croatia 
##                        1                        1                        1 
##                  Iceland                   Norway          North Macedonia 
##                        1                        1                        1 
##                    Egypt                   Mexico     China, Hong Kong SAR 
##                        1                        1                        1
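The features used for clustering are not shown here. As a hedged sketch, assume each country is represented by a point (log population, log consumption) and that base R's kmeans() is run with two centers; the nstart argument restarts from several random initializations and keeps the best solution, which reduces the chance of a poor local optimum. The data are simulated, with 38 countries in one group and 4 in a well-separated group to mimic the cluster sizes above; the country labels are hypothetical.

```r
# Minimal sketch on simulated data (hypothetical, not the real dataset)
set.seed(7)
main  <- cbind(log_pop = rnorm(38, 9, 0.5), log_cons = rnorm(38, 2, 0.5))
small <- cbind(log_pop = rnorm(4, 9, 0.5),  log_cons = rnorm(4, 8, 0.5))
x <- rbind(main, small)
rownames(x) <- paste0("country_", seq_len(nrow(x)))   # hypothetical labels

# Two centers, 25 random starts; the solution with the lowest within-cluster SS is kept
km <- kmeans(x, centers = 2, nstart = 25)
cat("2 clusters with size of:", km$size, "\n")
print(km$cluster)   # cluster id for each country
```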

After removing the four outliers identified by clustering (cluster 2: Denmark, Estonia, Ireland, Slovenia), plot consumption vs. population (1950 = 1). A few countries still show hockey-stick-like curves, so standardizing both consumption and population might work better (see below). Latvia (country 21) also appears to be an outlier.

Two rescalings were compared: rescaling to the range (0, 1), and standardization (scale()) after the log transformation. Standardization after log is better (R² = 0.78 vs. 0.71):

## 
## Call:
## lm(formula = logCspt ~ logPop, na.action = na.exclude)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.50349 -0.10973  0.00066  0.08226  0.77572 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.125588   0.007099   17.69   <2e-16 ***
## logPop      0.836460   0.010722   78.02   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.16 on 2514 degrees of freedom
## Multiple R-squared:  0.7077, Adjusted R-squared:  0.7076 
## F-statistic:  6086 on 1 and 2514 DF,  p-value: < 2.2e-16

## 
## Call:
## lm(formula = as.vector(scale(log10(consumption[, 7, ])[, -c(20, 
##     23, 25, 37, 38)])) ~ as.vector(scale(log10(socioeconomic[, 
##     1, ]))[, -c(20, 23, 25, 37, 38)]), na.action = na.exclude)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.45757 -0.24113 -0.00673  0.19729  2.81870 
## 
## Coefficients:
##                                                                           Estimate
## (Intercept)                                                             -1.349e-15
## as.vector(scale(log10(socioeconomic[, 1, ]))[, -c(20, 23, 25, 37, 38)])  8.825e-01
##                                                                         Std. Error
## (Intercept)                                                              9.310e-03
## as.vector(scale(log10(socioeconomic[, 1, ]))[, -c(20, 23, 25, 37, 38)])  9.379e-03
##                                                                         t value
## (Intercept)                                                                 0.0
## as.vector(scale(log10(socioeconomic[, 1, ]))[, -c(20, 23, 25, 37, 38)])    94.1
##                                                                         Pr(>|t|)
## (Intercept)                                                                    1
## as.vector(scale(log10(socioeconomic[, 1, ]))[, -c(20, 23, 25, 37, 38)])   <2e-16
##                                                                            
## (Intercept)                                                                
## as.vector(scale(log10(socioeconomic[, 1, ]))[, -c(20, 23, 25, 37, 38)]) ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.467 on 2514 degrees of freedom
## Multiple R-squared:  0.7789, Adjusted R-squared:  0.7788 
## F-statistic:  8855 on 1 and 2514 DF,  p-value: < 2.2e-16

As shown above, standardizing after the log transformation gives a good fit (R² = 0.78).
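One detail matters here: when scale() is given a matrix, it centers and standardizes each column separately. If consumption[, 7, ] has years in rows and countries in columns, as the indexing suggests, the regression above pools per-country z-scores, which removes each country's starting level. This would also explain why the fitted intercept is zero up to rounding and why the base of the logarithm (log10 vs. log) does not change the result. The sketch below illustrates this on a simulated panel; the growth rates, country levels, noise and slope of 0.8 are assumptions.

```r
# Minimal sketch on a simulated years x countries panel (hypothetical data)
set.seed(3)
n_years <- 68; n_ctry <- 40
logP0   <- runif(n_ctry, 6, 14)                 # country starting levels
growth  <- runif(n_ctry, 0.01, 0.03)            # annual log growth per country
level   <- rnorm(n_ctry, sd = 2)                # country-specific consumption level
logP <- outer(0:(n_years - 1), growth) +
        matrix(logP0, n_years, n_ctry, byrow = TRUE)
logC <- 0.8 * logP + matrix(level, n_years, n_ctry, byrow = TRUE) +
        rnorm(n_years * n_ctry, sd = 0.05)

# Pooled fit on raw logs: country levels end up in the residuals
r2_raw <- summary(lm(as.vector(logC) ~ as.vector(logP)))$r.squared

# scale() standardizes each column (country) separately
zC <- scale(logC); zP <- scale(logP)
fit_z <- lm(as.vector(zC) ~ as.vector(zP))
r2_z  <- summary(fit_z)$r.squared

cat(sprintf("R^2 raw logs: %.2f, per-country z-scores: %.2f\n", r2_raw, r2_z))
print(round(colMeans(zC)[1:3], 12))   # each column is centered at 0
```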

2.3 By sector (plastic consumption in kt vs. population in thousands)

3. Total plastic consumption/person

3.1 Total plastic consumption/person vs. population in 1950:

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.0000000 0.0001325 0.0005810 0.0010162 0.0013792 0.0053893

3.2 Plastic consumption/person vs. population from 1950 to 2017:

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.000000 0.002344 0.020643 0.045596 0.066930 1.095443

3.3 Scaled log plastic consumption/person vs. scaled log population, 1950 to 2017

## 
## Call:
## lm(formula = as.vector(slogcsmpperson) ~ as.vector(slogpop), 
##     na.action = na.exclude)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.68410 -0.26193 -0.00778  0.22293  2.87805 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        -1.382e-15  9.760e-03    0.00        1    
## as.vector(slogpop)  8.700e-01  9.832e-03   88.49   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4895 on 2514 degrees of freedom
##   (340 observations deleted due to missingness)
## Multiple R-squared:  0.757,  Adjusted R-squared:  0.7569 
## F-statistic:  7830 on 1 and 2514 DF,  p-value: < 2.2e-16

4. Consumption vs. GDP

No interesting relationship was found.

5. Consumption vs. urbanization

5.1 Consumption vs. urban population, after log transformation and then scaling

## 
## Call:
## lm(formula = as.vector(slogcsmp) ~ as.vector(slogurbanpop), na.action = na.exclude)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.44687 -0.17856 -0.00119  0.18494  1.73333 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             -3.249e-16  7.073e-03     0.0        1    
## as.vector(slogurbanpop)  9.340e-01  7.125e-03   131.1   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3548 on 2514 degrees of freedom
##   (340 observations deleted due to missingness)
## Multiple R-squared:  0.8724, Adjusted R-squared:  0.8723 
## F-statistic: 1.718e+04 on 1 and 2514 DF,  p-value: < 2.2e-16

5.2 Scaled log consumption vs. urbanization (not log-transformed or scaled)

## 
## Call:
## lm(formula = as.vector(slogcsmp) ~ as.vector(socioeconomic[, 
##     6, ]), na.action = na.exclude)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.9827 -0.5487  0.1436  0.6226  2.1699 
## 
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     -1.40620    0.06360  -22.11   <2e-16 ***
## as.vector(socioeconomic[, 6, ])  2.26459    0.09824   23.05   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9022 on 2514 degrees of freedom
##   (340 observations deleted due to missingness)
## Multiple R-squared:  0.1745, Adjusted R-squared:  0.1742 
## F-statistic: 531.4 on 1 and 2514 DF,  p-value: < 2.2e-16

5.3 Scaled log consumption/person vs. scaled log urban population

## 
## Call:
## lm(formula = as.vector(slogcsmpperson) ~ as.vector(slogurbanpop), 
##     na.action = na.exclude)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.78228 -0.19990 -0.00235  0.20322  1.79857 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             -3.742e-16  7.505e-03     0.0        1    
## as.vector(slogurbanpop)  9.253e-01  7.561e-03   122.4   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3765 on 2514 degrees of freedom
##   (340 observations deleted due to missingness)
## Multiple R-squared:  0.8563, Adjusted R-squared:  0.8562 
## F-statistic: 1.498e+04 on 1 and 2514 DF,  p-value: < 2.2e-16
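The conclusion at the top rests on comparing R² across single-predictor fits (0.78 for scaled log population, 0.87 for scaled log urban population, 0.17 for raw urbanization). A small helper along these lines can make that comparison explicit. The function name rank_by_r2 and the simulated columns are hypothetical stand-ins for the real variables, not the code used to produce the results above.

```r
# Hypothetical helper: fit y ~ x for each candidate predictor, sort by R^2
rank_by_r2 <- function(y, predictors) {
  r2 <- vapply(predictors,
               function(x) summary(lm(y ~ x, na.action = na.exclude))$r.squared,
               numeric(1))
  sort(r2, decreasing = TRUE)
}

# Simulated stand-ins for the scaled-log variables (not the real data)
set.seed(5)
N <- 500
slog_urbanpop <- rnorm(N)
slog_pop      <- slog_urbanpop + rnorm(N, sd = 0.5)
urbanization  <- rnorm(N)
slog_csmp     <- slog_urbanpop + rnorm(N, sd = 0.3)

r2 <- rank_by_r2(slog_csmp, list(slog_urbanpop = slog_urbanpop,
                                 slog_pop      = slog_pop,
                                 urbanization  = urbanization))
print(round(r2, 3))
```

Ranking by R² only compares single-predictor fits on the same observations; with missing values (340 observations were dropped above), each fit should use the same rows for the comparison to be fair.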